
 activity prediction


Appendix of Functionally Regionalized Knowledge Transfer for Low-resource Drug Discovery

Neural Information Processing Systems

For FC-individual, we train each testing assay separately with a two-layer fully connected base learner. For FC-All, a two-layer fully connected model is trained on samples from both the support set and query set of the source assays and from the support set of the target assay. C.1 Drug Activity Prediction Data For drug activity prediction, we summarize the number of assays belonging to each target family: GPCR (685), Ion channel (215), Kinase (665), NHR (123), Binding (2523), Phenotypic (2299), Functional (1689), Proteinase (289), ADME (55).
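The difference between the two baselines above lies in which samples reach the model. The following is a minimal sketch of that data pooling only, assuming each assay is a dictionary with hypothetical `"support"` and `"query"` keys holding `(features, label)` pairs; the function names and data layout are illustrative and not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical sample layout: (feature vector, activity label).
Sample = Tuple[List[float], float]
Assay = Dict[str, List[Sample]]


def fc_individual_training_set(target_support: List[Sample]) -> List[Sample]:
    """FC-individual: only the target assay's own support set is used."""
    return list(target_support)


def fc_all_training_set(source_assays: List[Assay],
                        target_support: List[Sample]) -> List[Sample]:
    """FC-All: pool support and query sets of every source assay,
    then add the support set of the target assay."""
    pooled: List[Sample] = []
    for assay in source_assays:
        pooled.extend(assay["support"])
        pooled.extend(assay["query"])
    pooled.extend(target_support)
    return pooled


# Toy data, invented purely for illustration.
sources = [
    {"support": [([0.1, 0.2], 1.0)], "query": [([0.3, 0.4], 0.0)]},
    {"support": [([0.5, 0.6], 1.0)], "query": [([0.7, 0.8], 1.0)]},
]
target_support = [([0.9, 1.0], 0.0)]

individual = fc_individual_training_set(target_support)
pooled = fc_all_training_set(sources, target_support)
```

In both cases the target assay's query set stays out of training, so it remains available for evaluation.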







Low-N Protein Activity Optimization with FolDE

arXiv.org Artificial Intelligence

Proteins are traditionally optimized through the costly construction and measurement of many mutants. Active Learning-assisted Directed Evolution (ALDE) alleviates that cost by predicting the best improvements and iteratively testing mutants to inform predictions. However, existing ALDE methods face a critical limitation: selecting the highest-predicted mutants in each round yields homogeneous training data insufficient for accurate prediction models in subsequent rounds. Here we present FolDE, an ALDE method designed to maximize end-of-campaign success. In simulations across 20 protein targets, FolDE discovers 23% more top 10% mutants than the best baseline ALDE method (p=0.005) and is 55% more likely to find top 1% mutants. FolDE achieves this primarily through naturalness-based warm-starting, which augments limited activity measurements with protein language model outputs to improve activity prediction. We also introduce a constant-liar batch selector, which improves batch diversity; this is important in multi-mutation campaigns but had limited effect in our benchmarks. The complete workflow is freely available as open-source software, making efficient protein optimization accessible to any laboratory.
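The abstract does not describe how FolDE implements its constant-liar batch selector. As a rough illustration of the general constant-liar idea, the sketch below picks one candidate at a time, pretends it was measured at a fixed "lie" value, and re-scores the remaining candidates. Everything in it is an assumption made for illustration: the toy surrogate (an inverse-distance-weighted mean over Hamming distance), the nearest-neighbour distance used as an uncertainty proxy, the pessimistic lie (the minimum observed activity), and the `beta` weight.

```python
from typing import List, Tuple

Observation = Tuple[str, float]  # (sequence, measured activity)


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def predict(seq: str, observed: List[Observation]) -> Tuple[float, float]:
    """Toy surrogate: inverse-distance-weighted mean activity, plus the
    distance to the nearest observed sequence as an uncertainty proxy."""
    weights = [1.0 / (1.0 + hamming(seq, s)) for s, _ in observed]
    mean = sum(w * y for w, (_, y) in zip(weights, observed)) / sum(weights)
    uncertainty = float(min(hamming(seq, s) for s, _ in observed))
    return mean, uncertainty


def constant_liar_batch(candidates: List[str],
                        observed: List[Observation],
                        batch_size: int,
                        beta: float = 1.0) -> List[str]:
    """Greedy batch selection with a constant lie.

    `observed` must be non-empty. After each pick, the chosen sequence is
    added to the training data with the lie value, which lowers the score
    of its near neighbours and pushes later picks elsewhere.
    """
    lie = min(y for _, y in observed)  # pessimistic constant lie (assumed)
    augmented = list(observed)
    pool = list(candidates)
    batch: List[str] = []
    while pool and len(batch) < batch_size:
        def score(seq: str) -> float:
            mean, unc = predict(seq, augmented)
            return mean + beta * unc  # upper-confidence-style acquisition
        best = max(pool, key=score)
        batch.append(best)
        pool.remove(best)
        augmented.append((best, lie))
    return batch


# Toy data, invented purely for illustration.
observed = [("AAAA", 1.0), ("AAAT", 0.2)]
candidates = ["AAAC", "AACC", "CAAA", "CCAA", "TTTT"]
batch = constant_liar_batch(candidates, observed, batch_size=3)
```

Without the lie step, a greedy selector would tend to return the top few near-identical candidates, which is the homogeneous-batch problem the abstract describes.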


Leveraging Data Augmentation and Siamese Learning for Predictive Process Monitoring

arXiv.org Artificial Intelligence

Predictive Process Monitoring (PPM) enables forecasting future events or outcomes of ongoing business process instances based on event logs. However, deep learning PPM approaches are often limited by the low variability and small size of real-world event logs. To address this, we introduce SiamSA-PPM, a novel self-supervised learning framework that combines Siamese learning with Statistical Augmentation for Predictive Process Monitoring. It employs three novel statistically grounded transformation methods that leverage control-flow semantics and frequent behavioral patterns to generate realistic, semantically valid new trace variants. These augmented views are used within a Siamese learning setup to learn generalizable representations of process prefixes without the need for labeled supervision. Extensive experiments on real-life event logs demonstrate that SiamSA-PPM achieves competitive or superior performance compared to the SOTA in both next activity and final outcome prediction tasks. Our results further show that statistical augmentation significantly outperforms random transformations and improves variability in the data, highlighting SiamSA-PPM as a promising direction for training data enrichment in process prediction.


PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding (Supplementary Material)

Neural Information Processing Systems

For example, the feature of dipeptide "st" is defined by its dipeptide composition ( The Moran feature descriptor defines the distribution of amino acid properties along a protein sequence. It should be noted that there are evident class imbalances in two multi-class classification tasks. Table 1: Balanced metric (weighted F1) compared with accuracy on multi-class classification tasks. We report mean (std) for each experiment. Used as a feature extractor with pre-trained weights frozen.
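To make the contrast between accuracy and weighted F1 under class imbalance concrete, here is a small sketch in plain Python. It uses the standard support-weighted definition of F1 (per-class F1 averaged with weights proportional to the number of true samples in each class); the labels are invented toy data and are not taken from the benchmark.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Imbalanced toy example: 8 samples of class 0, 2 of class 1,
# and a classifier that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
```

Here the always-majority classifier reaches an accuracy of 0.8, while its weighted F1 is lower because the minority class contributes an F1 of zero.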


DailyLLM: Context-Aware Activity Log Generation Using Multi-Modal Sensors and LLMs

arXiv.org Artificial Intelligence

Rich and context-aware activity logs facilitate user behavior analysis and health monitoring, making them a key research focus in ubiquitous computing. The remarkable semantic understanding and generation capabilities of Large Language Models (LLMs) have recently created new opportunities for activity log generation. However, existing methods continue to exhibit notable limitations in terms of accuracy, efficiency, and semantic richness. To address these challenges, we propose DailyLLM. To the best of our knowledge, this is the first log generation and summarization system that comprehensively integrates contextual activity information across four dimensions: location, motion, environment, and physiology, using only sensors commonly available on smartphones and smartwatches. To achieve this, DailyLLM introduces a lightweight LLM-based framework that integrates structured prompting with efficient feature extraction to enable high-level activity understanding. Extensive experiments demonstrate that DailyLLM outperforms state-of-the-art (SOTA) log generation methods and can be efficiently deployed on personal computers and Raspberry Pi. Utilizing only a 1.5B-parameter LLM model, DailyLLM achieves a 17% improvement in log generation BERTScore precision compared to the 70B-parameter SOTA baseline, while delivering nearly 10x faster inference speed.


RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes

arXiv.org Artificial Intelligence

Next activity prediction represents a fundamental challenge for optimizing business processes in service-oriented architectures such as microservices environments, distributed enterprise systems, and cloud-native platforms, which enables proactive resource allocation and dynamic service composition. Despite the prevalence of sequence-based methods, these approaches fail to capture non-sequential relationships that arise from parallel executions and conditional dependencies. Even though graph-based approaches address structural preservation, they suffer from homogeneous representations and static structures that apply uniform modeling strategies regardless of individual process complexity characteristics. To address these limitations, we introduce RLHGNN, a novel framework that transforms event logs into heterogeneous process graphs with three distinct edge types grounded in established process mining theory. Our approach creates four flexible graph structures by selectively combining these edges to accommodate different process complexities, and employs reinforcement learning formulated as a Markov Decision Process to automatically determine the optimal graph structure for each specific process instance. RLHGNN then applies heterogeneous graph convolution with relation-specific aggregation strategies to effectively predict the next activity. This adaptive methodology enables precise modeling of both sequential and non-sequential relationships in service interactions. Comprehensive evaluation on six real-world datasets demonstrates that RLHGNN consistently outperforms state-of-the-art approaches. Furthermore, it maintains an inference latency of approximately 1 ms per prediction, representing a highly practical solution suitable for real-time business process monitoring applications.
Service-oriented architectures have fundamentally transformed modern business process implementation, which enables distributed services to coordinate through well-defined interfaces for delivering substantial business value [1], [2]. Jiaxing Wang, Yifeng Yu, Jiahan Song, Bin Cao, and Jing Fan are with the College of Computer Science and Technology, Zhejiang University of Technology, 310023, Hangzhou, China, and also with Zhejiang Key Laboratory of Visual Information Intelligent Processing, 310023, Hangzhou, China (email: wjx@zjut.edu.cn,